Avatar database for mobile video communications

ABSTRACT

A method and system for avatar-based mobile video communications are disclosed. Since the creation and realistic driving of avatars may not be achievable fully automatically within a mobile communication device (e.g., a cellular phone), an avatar database is provided along with realistic driving mechanisms. Mobile callers may select appropriate downloadable avatars for use during a mobile video communication. The avatar database is provided as a global resource for the mobile video communication system.

The present invention relates to the field of mobile video communications. More particularly, the invention relates to a method and system including a global avatar database for use with a mobile video communication network.

Video communication networks have made it possible to exchange information in a virtual environment. One way this is facilitated is by the use of avatars. An avatar allows a user to communicate and interact with others in the virtual world.

The avatar can take many different shapes depending on the user's desires, for example, a talking head, a cartoon, an animal or a three-dimensional picture of the user. To other users in the virtual world, the avatar is a graphical representation of the user. The avatar may be used in the virtual reality when the user controlling the avatar logs on to, or interacts with, the virtual world, e.g., via a personal computer or mobile telephone.

As mentioned above, a talking head may be a three-dimensional representation of a person's head whose lips move in synchronization with speech. Talking heads can be used to create an illusion of a visual interconnection, even though the connection used is a speech channel.

For example, in audio-visual-speech systems, the integration of a “talking head” can be used for a variety of applications. Such applications may include, for example, model-based image compression for video telephony, presentations, avatars in virtual meeting rooms, intelligent computer-user interfaces such as e-mail reading and games, and many other operations. An example of such an intelligent user interface is a mobile video communication system that uses a talking head to express transmitted audio messages.

In audio-video systems, audio is processed to obtain phonemes and timing information, which is then passed to a face animation synthesizer. The face animation synthesizer uses an appropriate viseme image (from the set of N) to display with the phoneme and morphs from one phoneme to another. This conveys the appearance of facial movement (e.g., lips) synchronized to the audio. Such conventional systems are described in “Miketalk: A talking facial display based on morphing visemes,” T. Ezzat et al., Proc. Computer Animation Conf., pp. 96-102, Philadelphia, Pa., 1998, and “Photo-realistic talking-heads from image samples,” E. Cosatto et al., IEEE Trans. on Multimedia, Vol. 2, No. 3, September 2000.
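
By way of illustration only, the following sketch (in Python) shows one way such phoneme-driven viseme morphing might be realized. The phoneme-to-viseme mapping, the timing format and all names here are assumptions for exposition, not the disclosed implementation:

    import numpy as np

    # Hypothetical many-to-one phoneme-to-viseme mapping; several
    # phonemes share a single mouth shape.
    VISEME_FOR_PHONEME = {
        "p": "closed", "b": "closed", "m": "closed",
        "f": "lip_teeth", "v": "lip_teeth",
        "aa": "open_wide", "iy": "spread",
    }

    def morph(img_a, img_b, t):
        # Cross-fade between two viseme frames; t runs from 0.0 to 1.0.
        return ((1.0 - t) * img_a + t * img_b).astype(img_a.dtype)

    def synthesize_frames(phonemes, timings, viseme_images, fps=25):
        # phonemes: phoneme labels; timings: (start, end) seconds for
        # each transition; viseme_images: viseme name -> numpy image.
        frames = []
        for (ph_a, ph_b), (t_start, t_end) in zip(
                zip(phonemes, phonemes[1:]), timings):
            img_a = viseme_images[VISEME_FOR_PHONEME[ph_a]]
            img_b = viseme_images[VISEME_FOR_PHONEME[ph_b]]
            n_frames = max(1, int((t_end - t_start) * fps))
            for i in range(n_frames):
                frames.append(morph(img_a, img_b, i / n_frames))
        return frames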

There are two modeling approaches to animation of facial images: (1) geometry based and (2) image based. Image-based systems using photo-realistic talking heads have numerous benefits, which include a more personal user interface, increased intelligibility over other methods such as cartoon animation, and increased quality of the voice portion of such systems.

Three-dimensional (3D) modeling techniques can also be used. Such 3D models provide flexibility because the models can be altered to accommodate different expressions of speech and emotions. Unfortunately, these 3D models are usually not suitable for automatic realization by a computer system. The programming complexities of 3D modeling are increasing as present models are enhanced to facilitate greater realism. In such 3D modeling techniques, the number of polygons used to generate 3D synthesized scenes has grown exponentially. This greatly increases the memory and computer processing power required. Accordingly, 3D modeling techniques generally cannot be implemented in devices such as cellular telephones.

Presently, 2D avatars are used for applications such as Internet chatting and video e-mail. Conventional systems like CrazyTalk and FaceMail combine text-to-speech applications with avatar driving. A user can choose one of a number of existing avatars or provide his own and adjust face feature points of his own avatar. When text is entered, the avatar will mimic talking which corresponds to the text. However, this simple 2D avatar model does not produce realistic video sequences.

Creating 3D avatar models, as described above, typically requires a complicated and interactive technique that is too difficult for an average user.

Accordingly, an object of the invention is to provide a business model for avatar-based real-time mobile video communications.

Another object of the invention is to provide a global resource database of avatars for use with mobile video communication.

One embodiment of the present invention is directed to a video communication system including a mobile communication network, a mobile communication device including a display that is capable of exchanging information with another communication device via the mobile communication network, and a database including a plurality of avatars. The database is a global resource for the mobile communication network. The mobile communication device can access at least one of the plurality of avatars.

Another embodiment of the present invention is directed to a method for using an avatar for mobile video communication. The method includes the steps of initiating a video communication by a mobile communication device user to another video communication device user, accessing a global resource database including a plurality of avatars, and selecting one avatar of the plurality of avatars in the database. The method also includes the step of sending the one avatar to the other video communication device user.

Still further features and aspects of the present invention and various advantages thereof will be more apparent from the accompanying drawings and the following detailed description of the preferred embodiments.

FIG. 1 shows a conceptual diagram of a system in which a preferred embodiment of the present invention can be implemented.

FIG. 2 is a flowchart showing a method in accordance with a preferred embodiment of the invention.

In the following description, for purposes of explanation rather than limitation, specific details are set forth, such as the particular architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments, which depart from these specific details. Moreover, for purposes of simplicity and clarity, detailed descriptions of well-known devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

In FIG. 1, a general view of a mobile communication system 10 is shown. The network includes mobile stations (MS) 20, which can connect to different base station subsystems 30. The base stations (BS) 30 are interconnected by means of a network 40. The network 40 may be a wide area network, such as the public telephone network/cellular switch network, or an Internet router network that routes TCP/IP datagrams.

A variety of service nodes 50 can also be connected via the network 40. As shown, one such service that can be provided is a service for video communications. Service node 50 is configured to provide such video communications and is connected to the network 40 as a global resource.

Each MS 20 includes conventional mobile transmission/reception equipment to enable identification of a subscriber and to facilitate call completion. For example, when a caller attempts to place a call, i.e., in an area covered by the BS 30 of the network 40, the MS 20 and BS 30 exchange caller information between each other. At this time, a list of supported or subscribed services may also be exchanged via the network 40. For example, the caller may subscribe to mobile video communications via a mobile telephone 60 with a display 61.

However, as discussed above, it may be a major difficulty for the caller to create an avatar 70 for use with such mobile video communications. One embodiment of the present invention is directed to a database 80 of avatars stored in the service node 50 that the caller can access and download as needed. The driving mechanism for the avatar 70 to realistically mimic speech is also provided to the caller.

The database 80 may include a variety of different types of avatars 70, e.g., two-dimensional, three-dimensional, cartoon-like, and geometry- or image-based.

It is also noted that the service node 50 is a global resource for all the BS 30 and the MS 20. Accordingly, each BS 30 and/or MS 20 is not required to store any avatar information independently. This allows for a central point of access to all avatars 70 for update, maintenance and control. A plurality of linked service nodes 50 may also be provided, each with a subset of all the avatars 70. In such an arrangement, one service node 50 can access data in another service node 50 as needed to facilitate a mobile video communication call.

The database 80 (DB) contains at least an animation library and a coarticulation library. The data in one library may be used to extract samples from the other. For instance, the service node 50 may use data extracted from the coarticulation library to select appropriate frame parameters from the animation library to be provided to the caller.

It is also noted that coarticulation processing is performed. The purpose of this processing is to accommodate the effects of coarticulation in the ultimate synthesized output. The principle of coarticulation recognizes that the mouth shape corresponding to a phoneme depends not only on the spoken phoneme itself, but also on the phonemes spoken before (and sometimes after) the instant phoneme. An animation method that does not account for coarticulation effects would be perceived as artificial by an observer, because mouth shapes may be used in conjunction with a phoneme spoken in a context inconsistent with the use of those shapes.
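
Purely by way of illustration, the following sketch (Python, with hypothetical names and data layout) shows how data from the coarticulation library might be used to select frame parameters from the animation library based on the surrounding phoneme context:

    def select_frame_parameters(coart_library, animation_library,
                                prev_ph, cur_ph, next_ph):
        # Prefer the most specific phoneme context available in the
        # coarticulation library, falling back to coarser contexts.
        for context in ((prev_ph, cur_ph, next_ph),  # full triphone
                        (prev_ph, cur_ph, None),     # left context only
                        (None, cur_ph, None)):       # context-free
            mouth_shape = coart_library.get(context)
            if mouth_shape is not None:
                # Look up the frame parameters for that mouth shape.
                return animation_library[mouth_shape]
        raise KeyError("no mouth shape known for phoneme %r" % cur_ph)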

The service node 50 may also contain animation-synthesis software, such as image-based synthesis software. In this embodiment, a customized avatar may be created for the caller. This would typically be done prior to attempting to place a mobile call to another party.

To create a customized avatar, at least samples of movements and images of the caller are captured while the caller is speaking naturally. This may be done via a video input interface within a mobile telephone, or audio-image data may be captured in other ways (e.g., via a personal computer) and downloaded to the service node 50. The samples capture the characteristics of a talking person, such as the sound he or she produces when speaking a particular phoneme, the shape his or her mouth forms, and the manner in which he or she articulates transitions between phonemes. The image samples are processed and stored in the animation library of the service node 50.
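
The following sketch (Python, illustrative names only) outlines, under the assumption that a phoneme forced-alignment step is available, how captured samples might be segmented per phoneme and filed in the animation library:

    def build_animation_library(frames, frame_times, phoneme_intervals):
        # frames: captured video frames; frame_times: capture time of
        # each frame in seconds; phoneme_intervals: (phoneme, start,
        # end) tuples from an assumed forced-alignment step.
        library = {}
        for phoneme, start, end in phoneme_intervals:
            samples = [frame for frame, t in zip(frames, frame_times)
                       if start <= t < end]
            # Store all image samples observed for this phoneme; later
            # processing may normalize pose and lighting.
            library.setdefault(phoneme, []).extend(samples)
        return library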

In another embodiment, the caller may already have a particular avatar that can be provided (uploaded) to the service node 50 for future use.

FIG. 2 is a flowchart showing access and use of the avatar database 80. In step 100, the caller initiates a mobile telephone call. Information is then exchanged between the MS 20 and the BS 30, identifying the caller as a subscriber of the system 10, as well as determining what services the caller may use. It is noted that the caller may also be identified based upon the unique number associated with the mobile telephone 60.

The avatar database 80 is then accessed in step 110.

If the caller subscribes to a video communications service, the caller then may have the option of selecting (in step 121) an avatar 70 from the database 80. The caller may have a pre-selected default avatar for use with all calls or have different avatars associated with different parties to be called. For example, a particular avatar may be associated with each pre-programmed speed-dial number the caller has programmed.
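
A minimal sketch of this selection step, assuming a hypothetical per-caller profile that maps callee numbers (e.g., speed-dial entries) to avatar identifiers (all names illustrative):

    def select_avatar(profile, callee_number):
        # Use the avatar associated with this callee if one exists;
        # otherwise fall back to the caller's default avatar.
        per_callee = profile.get("avatar_by_callee", {})
        return per_callee.get(callee_number, profile["default_avatar"])

    profile = {
        "default_avatar": "talking_head_01",
        "avatar_by_callee": {"+15550123": "cartoon_07"},
    }
    assert select_avatar(profile, "+15550123") == "cartoon_07"
    assert select_avatar(profile, "+15559999") == "talking_head_01"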

Once the appropriate avatar 70 is determined (step 120), the service node 50 downloads the avatar 70 in step 130. This avatar is sent to the party to be called as part of the call set-up procedure. This may be performed in a manner similar to the transmission of caller-ID type information.

At this time, the service node 50 may also determine that the party to be called has a default avatar to be used for the caller. Once again, the party to be called may have a predetermined default avatar 70 for use with all calls, or the default avatar 70 may be based upon a predetermined association (e.g., based upon the caller's telephone number). The predetermined default avatar is sent to the caller. If no default avatar can be determined for the party to be called, then another predetermined system default avatar can be sent to the caller.

In step 140, as the call is established and continues, various (e.g., face) parameters of the caller and the party to be called are accessed in the database 80 and sent to the parties to ensure that the avatar 70 is mimicking the received speech and facial expressions accordingly.
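
Purely as an illustrative sketch (Python; the objects and functions named here are assumptions, not part of the disclosed system), step 140 might be realized as a loop that converts each incoming speech segment into animation parameters and forwards them to the opposite party:

    def drive_avatars(call, database, recognize_phonemes, send):
        # For each party's incoming audio, look up face parameters for
        # the recognized phonemes and stream them to the other party,
        # whose device renders them onto the currently selected avatar.
        for party in call.parties:
            for audio_chunk in party.audio_stream():
                phonemes = recognize_phonemes(audio_chunk)
                params = [database.face_parameters(ph)
                          for ph in phonemes]
                send(call.peer_of(party), params)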

During the call (step 150), the caller and/or the party to be called may dynamically change the avatar 70 currently being used.

Various functional operations associated with the system 10 may be implemented in whole or in part in one or more software programs stored in a memory and executed by a processor (e.g., in the MS 20, BS 30 or service node 50).

While the present invention has been described above in terms of specific embodiments, it is to be understood that the invention is not intended to be confined or limited to the embodiments disclosed herein. On the contrary, the present invention is intended to cover various structures and modifications thereof included within the spirit and scope of the appended claims.

1. A video communication system (10) comprising: a mobile communication network (20, 30); a mobile communication device (60) including a display (61) that is capable of exchanging information with another communication device via the mobile communication network; and a database (80) including a plurality of avatars (70), the database being a global resource for the mobile communication network, wherein the mobile communication device can access at least one of the plurality of avatars.

2. The video communication system (10) according to claim 1, wherein the mobile communication network is a cellular network including a plurality of mobile stations (20) and at least one base station (30).

3. The video communication system (10) according to claim 2, wherein the mobile communication device is a cellular telephone (60).

4. The video communication system (10) according to claim 1, wherein the plurality of avatars include at least one three-dimensional representation of a human head.

5. The video communication system (10) according to claim 1, wherein the plurality of avatars include at least one two-dimensional representation of a human head (70).

6. The video communication system (10) according to claim 1, wherein the plurality of avatars include at least one image-based representation of a human head (70).

7. The video communication system (10) according to claim 1, wherein the mobile communication device (60) further includes a video input interface.

8. The video communication system (10) according to claim 1, wherein the database (80) is part of a video service node (50) that is communicatively connected to the mobile communication network.

9. The video communication system (10) according to claim 8, wherein the video service node (50) further includes animation-synthesis software to allow a subscriber of the video communication system to create a customized avatar.

10. A method (FIG. 2) for using an avatar for mobile video communication, the method comprising the steps of: initiating a video communication by a mobile communication device user to another video communication device user; accessing a global resource database including a plurality of avatars; selecting one avatar of the plurality of avatars in the database; and sending the one avatar to the another video communication device user.

11. The method according to claim 10, wherein the mobile communication device is a cellular telephone.

12. The method according to claim 10, wherein the plurality of avatars include at least one three-dimensional representation of a human head.

13. The method according to claim 10, wherein the plurality of avatars include at least one two-dimensional representation of a human head.

14. The method according to claim 10, wherein the plurality of avatars include at least one image-based representation of a human head.

15. The method according to claim 10, further comprising the step of allowing the mobile communication device user to create a customized avatar by providing video information.

16. The method according to claim 10, wherein the selection step includes using a predetermined default avatar.

17. The method according to claim 16, wherein at least two different predetermined default avatars are used with two video communication device users to be called.

18. The method according to claim 10, further comprising the step of sending a predetermined avatar to the mobile communication device user.